Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add schema to the default location root #339

Closed
wants to merge 8 commits into from

Conversation

dan1elt0m
Copy link

@dan1elt0m dan1elt0m commented Apr 22, 2022

resolves #239

Description

Adds schema to the default location root

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-spark next" section.

@cla-bot cla-bot bot added the cla:yes label Apr 22, 2022
@dan1elt0m
Copy link
Author

@JCZuurmond

@JCZuurmond
Copy link
Collaborator

@dan1elt0m LGTM! Could you add the PR to the change log entry too? There are more entries having both the issue and the PR.

@jtcohen6 : We should communicate well about this fix. It may cause confusing: new tables created with this fix have the location which includes the schema, while existing tables have the location without the schema. If users do not like this change they keep the old behavior by adding the following macro to their project:

{% macro spark__location_clause() %}
  {%- set location_root = config.get('location_root', validator=validation.any[basestring]) -%}
  {%- set identifier = model['alias'] -%}
  {%- if location_root is not none %}
    location '{{ location_root }}/{{ identifier }}'
  {%- endif %}
{%- endmacro -%}

Still, I think it's a great change! It's even more confusing when multiple people work on the same dbt project expecting to create separate tables in their user schema, but actually the tables in different schema point to the same file location, causing unexpected, weird issues.

@dan1elt0m
Copy link
Author

Could you add the PR to the change log entry too?

Done

@jtcohen6
Copy link
Contributor

jtcohen6 commented May 3, 2022

@dan1elt0m Thanks for the PR!

@JCZuurmond I was just going to ask the same question. This is technically a breaking change, but it's really how this should have worked all along.

I'd like to include this change in v1.2 and include a clear upgrade notice to exactly this effect. Anyone who wants to preserve the old behavior has a clear mechanism for doing so.

@jtcohen6 jtcohen6 added the ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering label Jun 15, 2022
@McKnight-42 McKnight-42 reopened this Jun 15, 2022
@McKnight-42 McKnight-42 requested a review from jtcohen6 June 15, 2022 15:14
@JCZuurmond
Copy link
Collaborator

@jtcohen6 : What is the plan for this PR?

@jtcohen6
Copy link
Contributor

jtcohen6 commented Aug 9, 2022

@JCZuurmond Let's do it ✅ it's a good change, and a sensible default.

We're in the process of migrating our adapter repos to use the same changelog system (changie) that dbt-core has been using for several months. Once that's ready, let's update the changelog here to call this out as a (potentially) breaking change, for folks who've built an expectation around the old default S3 paths.

@McKnight-42
Copy link
Contributor

@dan1elt0m @JCZuurmond so sorry for the late update, if either of you are still free to work on this the changie swap over has been implemented some instructions can be found in the contributing guide here: https://github.com/dbt-labs/dbt-spark/blob/main/CONTRIBUTING.md#adding-changelog-entry. if you have any other questions or are not able to carry on working on this atm please let me know and we might be able to get it on our end.

@JCZuurmond
Copy link
Collaborator

@dan1elt0m Could you do the changie thing?

@cla-bot
Copy link

cla-bot bot commented Oct 31, 2022

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Daniel Tom.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@cla-bot cla-bot bot removed the cla:yes label Oct 31, 2022
@cla-bot cla-bot bot added the cla:yes label Oct 31, 2022
@dan1elt0m
Copy link
Author

@McKnight-42 @JCZuurmond: I added the changie log

@McKnight-42 McKnight-42 self-requested a review November 10, 2022 17:06
@McKnight-42 McKnight-42 reopened this Nov 10, 2022
@JCZuurmond
Copy link
Collaborator

@McKnight-42 : Wat is the plan with this PR?

@JCZuurmond
Copy link
Collaborator

@McKnight-42 : What is the plan with this PR? I see you opened it again

{%- set identifier = model['alias'] -%}
{%- if location_root is not none %}
location '{{ location_root }}/{{ identifier }}'
location '{{ location_root }}/{{ schema }}/{{ identifier }}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An interesting way to make this non-breaking would be to make this configurable:

Suggested change
location '{{ location_root }}/{{ schema }}/{{ identifier }}'
{%- set use_schema_in_location = config.get('use_schema_in_location_root') -%}
{%- if location_root is not none %}
{%- if use_schema_in_location is not none %}
location '{{ location_root }}/{{ schema }}/{{ identifier }}'
{% else %}
location '{{ location_root }}/{{ identifier }}'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we also update config.get() to stuff the schema into the location root if it's present?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JCZuurmond do you have an opinion on that?

Copy link
Collaborator

@JCZuurmond JCZuurmond Dec 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would like to have the location with the schema enabled by default.

Without the schema, users may overwrite each other's tables even when they have them in separate schemas because they are overwriting the underlying data via the location. This happened to colleagues of mine, which is a hard bug to track down.

I discussed the non-breaking/breaking change with @jtcohen6. I recall we were ok with it, but I do not see this conversation back in the issue maybe we had the conversation over Slack.

In any case, the change is kinda of a breaking change. The location is added when a user creates a new table; either because the data model is new (in the target schema), or when the materialization is table, or when using --full-refresh. Existing incremental tables are not affected and for the other options the tables are recreated anyway. Technically speaking, you could argue this is not (really) a breaking change.

My preference would be to enable the location with schema by default AND communicate clearly what is changed and why. It will be confusing if users do not know/understand this change and they see that some of the (older) tables have a location without schema and others have a location with schema. However, it will be even more confusing when multiple developers work together within the same warehouse and they are interfering with each other's tables unknowingly.

Copy link
Contributor

github-actions bot commented Nov 6, 2023

This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 6, 2023
@JCZuurmond
Copy link
Collaborator

@Fleid or @colin-rogers-dbt : What will we do with this PR? I still think it is an important change, but also tricky due to the change in location

Copy link
Contributor

github-actions bot commented Aug 7, 2024

This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Aug 7, 2024
Copy link
Contributor

Although we are closing this PR as stale, it can still be reopened to continue development. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this Aug 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add schema to the default location root
7 participants